Threshold Exceedance Model

The stationary ThresholdExceedance model is illustrated using the daily rainfall accumulations at a location in south-west England from 1914 to 1962. This dataset was studied by Coles (2001) in Chapter 4. The daily rainfall are assume independent and identically distributed.

The Extremes.jl package supports maximum likelihood inference, Bayesian inference and inference based on the probability weigthed moments. For the GEV parameter estimation, the following functions can be used:

  • gpfitpwm: estimation with the probability weighted moments;
  • gpfit: maximum likelihood estimation;
  • gpfitbayes: Bayesian estimation.
Log-scale paremeter

These functions return the estimate of the log-scale parameter $\phi = \log \sigma$.

Load the data

Loading the daily precipitations:

data = Extremes.dataset("rain")
first(data,5)

5 rows × 2 columns

DateRainfall
DateFloat64
11914-01-010.0
21914-01-022.3
31914-01-031.3
41914-01-046.9
51914-01-054.6

Plotting the data using the Gadfly package:

plot(data, x=:Date, y=:Rainfall, Geom.point, Theme(discrete_highlight_color=c->nothing))
Date Jan 1, 1840 1850 1860 1870 1880 1890 1900 1910 1920 1930 1940 1950 1960 1970 1980 1990 2000 2010 2020 2030 2040 Jan 1, 1800 1850 1900 1950 2000 2050 Jan 1, 1800 1850 1900 1950 2000 2050 Jan 1, 1800 1850 1900 1950 2000 2050 h,j,k,l,arrows,drag to pan i,o,+,-,scroll,shift-drag to zoom r,dbl-click to reset c for coordinates ? for help ? -125 -100 -75 -50 -25 0 25 50 75 100 125 150 175 200 225 -100 -90 -80 -70 -60 -50 -40 -30 -20 -10 0 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200 -100 0 100 200 -100 -95 -90 -85 -80 -75 -70 -65 -60 -55 -50 -45 -40 -35 -30 -25 -20 -15 -10 -5 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110 115 120 125 130 135 140 145 150 155 160 165 170 175 180 185 190 195 200 Rainfall

Threshold selection

A suitable threshold for the Peaks-Over-Threshold model can be chosen by examining the mean residual life plot. The mean residual life is expected to be a linear function of the threshold when the latter is high enough. The mean residual life plot can be plotted with the mrlplot function:

mrlplot(data[:,:Rainfall])
Threshold -125 -100 -75 -50 -25 0 25 50 75 100 125 150 175 200 225 -100 -90 -80 -70 -60 -50 -40 -30 -20 -10 0 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200 -100 0 100 200 -100 -95 -90 -85 -80 -75 -70 -65 -60 -55 -50 -45 -40 -35 -30 -25 -20 -15 -10 -5 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110 115 120 125 130 135 140 145 150 155 160 165 170 175 180 185 190 195 200 h,j,k,l,arrows,drag to pan i,o,+,-,scroll,shift-drag to zoom r,dbl-click to reset c for coordinates ? for help ? -40 -30 -20 -10 0 10 20 30 40 50 60 70 -30 -28 -26 -24 -22 -20 -18 -16 -14 -12 -10 -8 -6 -4 -2 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 -30 0 30 60 -30 -29 -28 -27 -26 -25 -24 -23 -22 -21 -20 -19 -18 -17 -16 -15 -14 -13 -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Mean Residual Life

As concluded by Coles (2001, Chapter 4), a reasonable threshold is 30 mm.

threshold = 30.0

Exceedances extraction

Parameter estimation of the Generalized Pareto distribution in Extremes.jl is performed using the threshold exceedances previously extracted. The support of the exceedances given in the fit function is therefore $(0,∞)$.

For the Rainfall example, let's extract the threshold exceedances.

Identify first the threshold exceedances:

threshold = 30.0
df = filter(row -> row.Rainfall > threshold, data)
first(df, 5)

5 rows × 2 columns

DateRainfall
DateFloat64
11914-02-0731.8
21914-03-0832.5
31914-12-1731.8
41914-12-3044.5
51915-02-1330.5

Retrieve the exceedances above the threshold:

df[!,:Rainfall] =  df[!,:Rainfall] .- threshold
rename!(df, :Rainfall => :Exceedance)
first(df, 5)

5 rows × 2 columns

DateExceedance
DateFloat64
11914-02-071.8
21914-03-082.5
31914-12-171.8
41914-12-3014.5
51915-02-130.5

Maximum likelihood inference

GP parameter estimation

The Generalized Pareto maximum likelihood parameter estimates are computed with the gpfit function:

julia> fm = gpfit(df, :Exceedance)MaximumLikelihoodEVA
model :
	ThresholdExceedance
	data :		Vector{Float64}[152]
	logscale :	ϕ ~ 1
	shape :		ξ ~ 1

θ̂  :	[2.006896498380506, 0.1844926991237574]
Note

In this example, the gpfit function is called using the data DataFrame structure as the first argument. The function can also be called using the vector of maxima as the first argument, e.g. gpfit(df[:,:Exceedance]).

The vector of the parameter estimates $\hat\mathbf{\theta} = (ϕ̂,\, ξ̂)^\top$ is contained in the field θ̂ of the structure fm:<fittedEVA.

The approximate covariance matrix of the parameter estimates can be obtained with the parametervar function:

julia> parametervar(fm)2×2 Matrix{Float64}:
  0.0165972   -0.00880429
 -0.00880429   0.0102416

Confidence intervals for the parameters are obtained with the cint function:

julia> cint(fm)2-element Vector{Vector{Float64}}:
 [1.7543940197850578, 2.2593989769759544]
 [-0.013857525987188729, 0.38284292423470356]

For instance, the shape parameter 95% confidence interval is as follows:

julia> cint(fm)[2]2-element Vector{Float64}:
 -0.013857525987188729
  0.38284292423470356

Diagnostic plots

Several diagnostic plots for assessing the accuracy of the fitted GP distribution to the rainfall data are can be shown with the diagnosticplots function:

set_default_plot_size(21cm ,16cm)
diagnosticplots(fm)
Return Period 10-3.0 10-2.5 10-2.0 10-1.5 10-1.0 10-0.5 100.0 100.5 101.0 101.5 102.0 102.5 103.0 103.5 104.0 104.5 105.0 105.5 10-2.6 10-2.4 10-2.2 10-2.0 10-1.8 10-1.6 10-1.4 10-1.2 10-1.0 10-0.8 10-0.6 10-0.4 10-0.2 100.0 100.2 100.4 100.6 100.8 101.0 101.2 101.4 101.6 101.8 102.0 102.2 102.4 102.6 102.8 103.0 103.2 103.4 103.6 103.8 104.0 104.2 104.4 104.6 104.8 105.0 10-2.5 100.0 102.5 105.0 10-2.5 10-2.4 10-2.3 10-2.2 10-2.1 10-2.0 10-1.9 10-1.8 10-1.7 10-1.6 10-1.5 10-1.4 10-1.3 10-1.2 10-1.1 10-1.0 10-0.9 10-0.8 10-0.7 10-0.6 10-0.5 10-0.4 10-0.3 10-0.2 10-0.1 100.0 100.1 100.2 100.3 100.4 100.5 100.6 100.7 100.8 100.9 101.0 101.1 101.2 101.3 101.4 101.5 101.6 101.7 101.8 101.9 102.0 102.1 102.2 102.3 102.4 102.5 102.6 102.7 102.8 102.9 103.0 103.1 103.2 103.3 103.4 103.5 103.6 103.7 103.8 103.9 104.0 104.1 104.2 104.3 104.4 104.5 104.6 104.7 104.8 104.9 105.0 h,j,k,l,arrows,drag to pan i,o,+,-,scroll,shift-drag to zoom r,dbl-click to reset c for coordinates ? for help ? -100 -80 -60 -40 -20 0 20 40 60 80 100 120 140 160 180 -80 -75 -70 -65 -60 -55 -50 -45 -40 -35 -30 -25 -20 -15 -10 -5 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110 115 120 125 130 135 140 145 150 155 160 -100 0 100 200 -80 -75 -70 -65 -60 -55 -50 -45 -40 -35 -30 -25 -20 -15 -10 -5 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110 115 120 125 130 135 140 145 150 155 160 Return Level Return Level Plot Data -200 -150 -100 -50 0 50 100 150 200 250 300 350 -110 -100 -90 -80 -70 -60 -50 -40 -30 -20 -10 0 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200 210 -200 0 200 400 -105 -100 -95 -90 -85 -80 -75 -70 -65 -60 -55 -50 -45 -40 -35 -30 -25 -20 -15 -10 -5 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110 115 120 125 130 135 140 145 150 155 160 165 170 175 180 185 190 195 200 205 210 h,j,k,l,arrows,drag to pan i,o,+,-,scroll,shift-drag to zoom r,dbl-click to reset c for coordinates ? for help ? -0.20 -0.15 -0.10 -0.05 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 -0.16 -0.15 -0.14 -0.13 -0.12 -0.11 -0.10 -0.09 -0.08 -0.07 -0.06 -0.05 -0.04 -0.03 -0.02 -0.01 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19 0.20 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.30 0.31 -0.2 0.0 0.2 0.4 -0.155 -0.150 -0.145 -0.140 -0.135 -0.130 -0.125 -0.120 -0.115 -0.110 -0.105 -0.100 -0.095 -0.090 -0.085 -0.080 -0.075 -0.070 -0.065 -0.060 -0.055 -0.050 -0.045 -0.040 -0.035 -0.030 -0.025 -0.020 -0.015 -0.010 -0.005 0.000 0.005 0.010 0.015 0.020 0.025 0.030 0.035 0.040 0.045 0.050 0.055 0.060 0.065 0.070 0.075 0.080 0.085 0.090 0.095 0.100 0.105 0.110 0.115 0.120 0.125 0.130 0.135 0.140 0.145 0.150 0.155 0.160 0.165 0.170 0.175 0.180 0.185 0.190 0.195 0.200 0.205 0.210 0.215 0.220 0.225 0.230 0.235 0.240 0.245 0.250 0.255 0.260 0.265 0.270 0.275 0.280 0.285 0.290 0.295 0.300 0.305 Density Density Plot Model -100 -80 -60 -40 -20 0 20 40 60 80 100 120 140 160 180 -80 -75 -70 -65 -60 -55 -50 -45 -40 -35 -30 -25 -20 -15 -10 -5 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110 115 120 125 130 135 140 145 150 155 160 -100 0 100 200 -80 -75 -70 -65 -60 -55 -50 -45 -40 -35 -30 -25 -20 -15 -10 -5 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110 115 120 125 130 135 140 145 150 155 160 h,j,k,l,arrows,drag to pan i,o,+,-,scroll,shift-drag to zoom r,dbl-click to reset c for coordinates ? for help ? -70 -60 -50 -40 -30 -20 -10 0 10 20 30 40 50 60 70 80 90 100 110 120 130 -60 -55 -50 -45 -40 -35 -30 -25 -20 -15 -10 -5 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110 115 120 -100 0 100 200 -60 -58 -56 -54 -52 -50 -48 -46 -44 -42 -40 -38 -36 -34 -32 -30 -28 -26 -24 -22 -20 -18 -16 -14 -12 -10 -8 -6 -4 -2 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 100 102 104 106 108 110 112 114 116 118 120 Empirical Quantile Plot Model -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 -1 0 1 2 -1.00 -0.95 -0.90 -0.85 -0.80 -0.75 -0.70 -0.65 -0.60 -0.55 -0.50 -0.45 -0.40 -0.35 -0.30 -0.25 -0.20 -0.15 -0.10 -0.05 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00 1.05 1.10 1.15 1.20 1.25 1.30 1.35 1.40 1.45 1.50 1.55 1.60 1.65 1.70 1.75 1.80 1.85 1.90 1.95 2.00 h,j,k,l,arrows,drag to pan i,o,+,-,scroll,shift-drag to zoom r,dbl-click to reset c for coordinates ? for help ? -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 -1 0 1 2 -1.00 -0.95 -0.90 -0.85 -0.80 -0.75 -0.70 -0.65 -0.60 -0.55 -0.50 -0.45 -0.40 -0.35 -0.30 -0.25 -0.20 -0.15 -0.10 -0.05 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00 1.05 1.10 1.15 1.20 1.25 1.30 1.35 1.40 1.45 1.50 1.55 1.60 1.65 1.70 1.75 1.80 1.85 1.90 1.95 2.00 Empirical Probability Plot

The diagnostic plots consist in the probability plot (upper left panel), the quantile plot (upper right panel), the density plot (lower left panel) and the return level plot (lower right panel). These plots can be displayed separately using respectively the probplot, qqplot, histplot and returnlevelplot functions.

Return level estimation

T-year return level estimate can be obtained using the returnlevel function. Along with the fitted Generalized Pareto distribution for the threshold exceedances, the following informations:

  • the threshold;
  • the number of total observation;
  • the number of observation per year.

are also needed for estimating the T-year return level using the POT model.

For example, the 100-year return level for the rainfall POT model is computed as follows:

julia> nobs = size(data,1)17531
julia> nobsperblock = 365365
julia> r = returnlevel(fm, threshold, nobs, nobsperblock, 100)ReturnLevel returnperiod : 100 value : Vector{Float64}[1]

The return level can be accessed as follows:

julia> r.value1-element Vector{Float64}:
 106.32558691303024

The corresponding confidence interval can be computed with the cint function:

julia> c = cint(r)1-element Vector{Vector{Real}}:
 [65.4816377433487, 147.16953608271177]
Type-stable function

In this example of a stationary model, the function returns a unit dimension vector for the return level and a vector containing only one vector for the confidence interval. The reason is that the function always returns the same type in the stationary and non-stationary case. The function is therefore type-stable allowing better performance of code execution.

To get the scalar return level in the stationary case, the following command can be used:

julia> r.value[]106.32558691303024

To get the scalar confidence interval in the stationary case, the following command can be used:

julia> c[]2-element Vector{Real}:
  65.4816377433487
 147.16953608271177

Bayesian Inference

Most functions described in the previous sections also work in the Bayesian context.

GP parameter estimation

The Bayesian GP parameter estimation is performed with the gpfitbayes function:

julia> fm = gpfitbayes(df, :Exceedance)
Progress:  14%|█████▊                                   |  ETA: 0:00:01
Progress:  28%|███████████▌                             |  ETA: 0:00:01
Progress:  44%|██████████████████                       |  ETA: 0:00:00
Progress:  60%|████████████████████████▊                |  ETA: 0:00:00
Progress:  76%|███████████████████████████████▍         |  ETA: 0:00:00
Progress:  93%|██████████████████████████████████████▏  |  ETA: 0:00:00
Progress: 100%|█████████████████████████████████████████| Time: 0:00:00
BayesianEVA
model :
	ThresholdExceedance
	data :		Vector{Float64}[152]
	logscale :	ϕ ~ 1
	shape :		ξ ~ 1

sim :
	Mamba.Chains
	Iterations :		2001:5000
	Thinning interval :	1
	Chains :		1
	Samples per chain :	3000
	Value :			Array{Float64, 3}[3000,2,1]
Prior

Currently, only the improper uniform prior is implemented, i.e. \[ f_{(ϕ,ξ)}(ϕ,ξ) ∝ 1. \] It yields to a proper posterior as long as the sample size is larger than 2 (Northrop and Attalides, 2016).

Sampling scheme

Currently, the No-U-Turn Sampler extension (Hoffman and Gelman, 2014) to Hamiltonian Monte Carlo (Neel, 2011, Chapter 5) is implemented for simulating an autocorrelated sample from the posterior distribution.

The generated sample from the posterior distribution is contained in the field sim of the fitted structure. It is an object of type Chains from the Mamba.jl package.

Credible intervals for the parameters are obtained with the cint function:

julia> cint(fm)2-element Vector{Vector{Float64}}:
 [1.7293406835670435, 2.2288752619090455]
 [0.02816900829546612, 0.42238312497413855]

Inference based on the probability weighted moments

Most functions described in the previous sections also work for the model fitted with the probability weighted moments.

GP parameter estimation

The parameter estimation based on the probability weighted moments is performed with the gpfitpwm function:

julia> fm = gpfitpwm(df, :Exceedance)pwmEVA
model :
	ThresholdExceedance
	data :		Vector{Float64}[152]
	logscale :	ϕ ~ 1
	shape :		ξ ~ 1

θ̂  :	[1.9877399514951732, 0.19651587232938317]

The approximate covariance matrix of the parameter estimates using a bootstrap procedure can be obtained with the parametervar function:

julia> parametervar(fm)2×2 Matrix{Float64}:
  0.016004    -0.00631361
 -0.00631361   0.00640124

Confidence intervals on the parameter estimates using a bootstrap procedure can be obtained with the cint function:

julia> cint(fm)2-element Vector{Vector{Float64}}:
 [1.754844711230875, 2.2369744777770424]
 [0.01181055440696415, 0.32892351562426025]